
FIGURE 3.4
Accuracy (%) with different numbers of clustering centers (2, 3, and 4) for 20-layer MCNs with width 16-16-32-64, with and without center loss.

with a batch size of 128. The performance of MCNs with different values of θ is shown in Fig. 3.7. First, only the effect of θ is evaluated; then the center loss is introduced through a fine-tuning process. The performance is observed to be stable under variations of θ and λ.
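This excerpt does not give the exact form of the center loss used for MCNs, so the sketch below is only a minimal illustration of such a fine-tuning step: it assumes the center loss pulls each convolutional weight toward its nearest clustering center and is added to the task loss with weight λ (the names center_loss and lam, and the two-center choice, are illustrative, not from the original code).

import torch

def center_loss(weights, centers):
    # Mean squared distance from each weight to its nearest clustering center.
    # Both argument names are illustrative, not from the original MCN code.
    dists = (weights.view(-1, 1) - centers.view(1, -1)) ** 2
    return dists.min(dim=1).values.mean()

# Hypothetical fine-tuning step: total loss = task loss + lambda * center loss.
lam = 1e-4                                  # assumed weight of the center-loss term
weights = torch.randn(64, 16, 3, 3, requires_grad=True)
centers = torch.tensor([-1.0, 1.0])         # two clustering centers (U = 2)
task_loss = torch.tensor(0.0)               # stands in for the cross-entropy loss
loss = task_loss + lam * center_loss(weights, centers)
loss.backward()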

The number of clustering centers: We evaluate the quantization with U = 2, 3, and 4, where U denotes the number of clustering centers. In this experiment, we investigate the effect of varying the number of clustering centers in MCNs on CIFAR-10.
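To make the role of U concrete, here is a minimal sketch of one plausible quantization step, assuming each weight is simply mapped to its nearest clustering center; the center values are placeholders, and the exact assignment rule in MCNs may differ.

import torch

def quantize_to_centers(weights, centers):
    # Map every weight to the nearest of the U clustering centers.
    dists = (weights.unsqueeze(-1) - centers) ** 2    # distance to each center
    return centers[dists.argmin(dim=-1)]              # nearest-center lookup

w = torch.randn(16, 16, 3, 3)
q2 = quantize_to_centers(w, torch.tensor([-1.0, 1.0]))              # U = 2
q4 = quantize_to_centers(w, torch.tensor([-1.0, -0.5, 0.5, 1.0]))   # U = 4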

The results are shown in Fig. 3.4: accuracy increases with more clustering centers, and the center loss further improves performance. However, to save storage space and to compare with other binary networks, we use two clustering centers for MCNs in all the following experiments.

Our binarized networks reduce the storage of convolutional layers by a factor of 32 compared with the corresponding full-precision networks, in which each real value is stored with 4 bytes (32 bits). Since only the single fully connected layer of an MCN is not binarized, the storage of the whole network is significantly reduced.
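As a back-of-the-envelope check of this factor, the layer shape below is purely illustrative: a full-precision weight costs 32 bits and a binarized weight 1 bit, so the weight storage of a convolutional layer shrinks by 32x.

def conv_weight_bytes(out_ch, in_ch, k, bits_per_weight):
    # Storage of one convolutional layer's weights, ignoring biases.
    return out_ch * in_ch * k * k * bits_per_weight / 8

full_precision = conv_weight_bytes(64, 32, 3, bits_per_weight=32)
binarized = conv_weight_bytes(64, 32, 3, bits_per_weight=1)
print(full_precision / binarized)   # 32.0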

The architecture parameter K: We also evaluate K, the number of planes in each M-Filter. As the results in Fig. 3.5 reveal, using more planes of each M-Filter to reconstruct the unbinarized filters yields better performance; for example, increasing K from 4 to 8 improves accuracy by 1.02%. For simplicity, we choose K = 4 in the following experiments.
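The reconstruction of unbinarized filters from M-Filters is defined earlier in the chapter; purely to illustrate what K controls, the sketch below assumes each of the K planes of an M-Filter modulates a shared binarized filter element-wise, so that a larger K gives a richer set of reconstructed filters (the names and the exact combination rule are assumptions, not the MCN definition).

import torch

K, in_ch, k = 4, 16, 3                                 # K planes per M-Filter
binarized = torch.sign(torch.randn(in_ch, k, k))       # binarized filter, entries in {-1, +1}
m_filter = torch.rand(K, k, k)                         # the K modulation planes

# Each plane modulates the binarized filter element-wise, yielding K
# reconstructed filters that together approximate the unbinarized filter.
reconstructed = binarized.unsqueeze(0) * m_filter.unsqueeze(1)   # shape (K, in_ch, k, k)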

The width of MCNs: CIFAR-10 is used to evaluate the effect of the width of Wide-ResNets built with MCNs. The accuracy and number of parameters are compared with a recent binary CNN, LBCNN. The basic stage widths (the number of convolution kernels per layer) are set to 16-16-32-64. To compare with LBCNN, we set up 20-layer MCNs with basic block-c (in Fig. 3.9), whose depth is the same as that of LBCNN. We also use other network widths to evaluate the effect of width on MCNs.

The results are shown in Table 3.1. The second column gives the width of each layer of the MCNs, using notation similar to that in [281]. The third column lists the numbers of parameters of the MCNs and of the 20-layer LBCNN with its best result. The fourth column shows the accuracy of the baselines, which are trained with the Wide-ResNets (WRNs) structure using the same depth and width as the MCNs. The last two